- shtml: SHell hTML converter, by Chris Underwood.
-
- email: csuwz@csv.warwick.ac.uk
- www: http://www.csv.warwick.ac.uk/~csuwz/
-
- The scenario:
-
- "Hey, what I need is a utility that allows me to read .html files easily and
- with low memory overheads. It sounds like an easy enough shell utility. I'll
- just check the Aminet."
-
- A few downloads later:
-
- "Well, none of them worked... That one didn't seem to do anything, that one
- didn't seem to get any of the tags right and cut things rather badly, that one
- made horrid thrashing noises, scrambled the screen and then software failed
- with uncalled-for violence, and that one invalidated my hard drive. Something
- I don't take kindly to. Darn, I'll just have to learn C and write myself one
- that works!"
-
- So I did. And for lack of a better name, I called it shtml (not that it has
- much to do with secure protocols, I just like the sound of it).
-
- A better description:
-
- shtml is a small ~15k program written in C, and compiled by the oddest
- integrated environment since the original AMOS: SAS/C. Don't read oddest as
- worst, because I actually quite like it (I quite liked AMOS too).
- Anyhow, shtml takes an html file (world wide web page) and formats it so that
- it is readable as a pure ASCII text file. If none of that makes any sense to
- you then you probably won't find a use for it. In this case, go hunting on the
- Aminet for some of the other stuff I've written.
-
- Installing it:
-
- shtml fits perfectly in your c: directory and that's about the only
- installation it needs. It can now be used like any other command.
-
- Using it:
-
- Type at a shell prompt:
-
- shtml ?
-
- for a command synopsis (usage chart). Just to be awkward, I have written this in
- Backus-Naur form. The command synopsis given looks like:
-
- Usage: shtml (-h|-help)|([-p|-pure] infile.html [outfile.txt])
-
- | means 'one or the other', so x|y means type x or y. Not both. Stuff in square
- brackets [] means 'optional' and is not required, but can be used as a switch
- to turn on or off a certain effect. Standard brackets () are to group a set
- of options and are not literal. Therefore, you can type any of the following,
- assuming that infile.html exists and the path to outfile.txt also exists:
-
- shtml ?
- shtml -h
- shtml -help
- shtml infile.html
- shtml infile.html outfile.txt
- shtml -p infile.html
- shtml -p infile.html outfile.txt
- shtml -pure infile.html
- shtml -pure infile.html outfile.txt
-
- -h and -help both mean the same. They tell shtml to print a fairly short text
- explaining itself and how it is used.
-
- -p and -pure are also equivalent. The pure flag is used to ask shtml not to
- print any text inversion characters. Normally text is inverted for titles,
- hrefs and the irritating blink tag. -pure suppresses this change.
-
- infile.html: This is the name (and path if necessary) of the source html file.
- You could feed shtml a simple text document and watch it get formatted to the
- screen's width if you needed.
-
- outfile.txt: This is the name and path of the file to write the output to. If
- no file is specified then output goes to stdout (normally the screen). If the
- file does not exist then it is created. If the file already exists, it is
- overwritten without warning.
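-
- For anyone who wants to see the usage line above as code, here is a minimal
- sketch of how such a command line could be parsed. It is not taken from the
- shtml source: the struct shtml_args and the function parse_args are
- hypothetical names made up for this illustration.
-
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed command line (not from shtml's source). */
struct shtml_args {
    int show_help;        /* 1 if ?, -h or -help was given */
    int pure;             /* 1 if -p or -pure was given */
    const char *infile;   /* source html file, NULL if missing */
    const char *outfile;  /* NULL means "write to stdout" */
};

/* Parse argv following the usage line.
 * Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, struct shtml_args *out)
{
    int i = 1;

    out->show_help = 0;
    out->pure = 0;
    out->infile = NULL;
    out->outfile = NULL;

    if (argc < 2)
        return -1;

    /* Help request: nothing else is looked at. */
    if (strcmp(argv[1], "?") == 0 || strcmp(argv[1], "-h") == 0 ||
        strcmp(argv[1], "-help") == 0) {
        out->show_help = 1;
        return 0;
    }

    /* Optional pure switch. */
    if (strcmp(argv[i], "-p") == 0 || strcmp(argv[i], "-pure") == 0) {
        out->pure = 1;
        i++;
    }

    /* infile.html is mandatory. */
    if (i >= argc)
        return -1;
    out->infile = argv[i++];

    /* outfile.txt is optional. */
    if (i < argc)
        out->outfile = argv[i++];

    /* Anything left over is a usage error. */
    return (i == argc) ? 0 : -1;
}
```
-
- A caller would print the help text when show_help is set, and fall back to
- stdout when outfile is NULL.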
-
- I have included the file "shtml_me.html" with which to test this program.
- Just type something like "shtml shtml_me.html" in a shell and watch the output.
- Type: "type shtml_me.html" to see what the file originally looks like, and why
- the shtml command is useful.
-
- History:
-
- (Like anyone cares about history...)
-
- Version 1. What you have with you. It took about 3 days of lazy work to
- develop and wasn't it worth it? Well, you decide.
-
- Future:
-
- Anyone got any good ideas? (Constructive please - I do a good critical,
- cynical, sarcastic job myself.) Anything useful and code-able I'll probably
- get round to doing eventually.
- On this note, thanks to Rich Neal for pointing out that comments weren't being
- ignored if they had tags nested in them. 5 minutes of coding later they
- were completely ignored as they should be.
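-
- To illustrate the comment problem, here is a minimal sketch of one way to
- drop html comments, nested tags and all. It is not the fix that went into
- shtml: strip_comments is a hypothetical helper, and it works on a whole
- string where the real program works on a stream.
-
```c
#include <string.h>

/* Copy src to dst, dropping every <!-- ... --> comment.
 * Everything up to the closing --> is skipped, so tags nested inside
 * a comment never get treated as tags.
 * dst must have room for at least strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    while (*src != '\0') {
        if (strncmp(src, "<!--", 4) == 0) {
            const char *end = strstr(src + 4, "-->");
            if (end == NULL)
                break;          /* unterminated comment: drop the rest */
            src = end + 3;      /* resume just after the --> */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```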
-
- Something that this thing does need is a width specifier. At the moment it is
- hard wired into the code with a #define (and it still works if it is
- recompiled with a different value - if you need it, do it). It would be quite
- easy to make it specifiable in a command line option but I can't be arsed atm.
- Next revision probably :)
-
- I know the <center> tag isn't acted upon, but it would require quite some
- re-writing to make this work. It would be nice but it isn't really worth my
- effort. Wanna know why it won't be an easy thing to implement? Well, read
- on if you could care less:
-
- How it actually works (algorithm and other bad ideas :)
-
- The incoming stream of characters is split into a stream of words that are
- handled instantly. A word is either a tag, or it is a string of characters
- separated by whitespace (Space, Return and Tab).
- Each word is checked to see if it is a tag. If it is, it is sent to a tag
- processor that handles stuff like <br> and <pre>. Any unrecognised tags
- are simply ignored and not printed. If a tag such as <title> comes along, the
- inverse text code is sent to the outfile. If </title> is then seen, the normal
- text mode is entered again.
- If the word is not a tag it is simply sent straight to the outfile (or stdout)
- with an added space. Before a word is sent, a simple check is made to see if
- it fits on the current line. If it doesn't, a newline character is sent first.
- If the <center> tag were to be implemented the program would need to know
- how far to indent the left hand edge before it started printing. To determine
- this value, it would need to know the used space on the current line. To know
- this value it has to write the line. By the time it is finished it is too late
- to indent the left hand edge because the line has been written already. Bummer,
- huh?
- I could get round this by buffering the current line of output but this was
- more trouble than it was worth and I wrote it off at an early stage. (Just
- before the first usable working version :)
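-
- For the curious, the line buffering idea could look something like the
- sketch below. None of this is in shtml: line_buf, center_indent, add_word,
- flush_centered and LINE_CAP are hypothetical names, and the sketch assumes
- the width is never larger than LINE_CAP.
-
```c
#include <stdio.h>
#include <string.h>

#define LINE_CAP 256  /* hypothetical buffer size; width must not exceed it */

/* Hypothetical buffer holding the line being built. */
struct line_buf {
    char text[LINE_CAP];
    int used;   /* characters currently in the buffer */
    int width;  /* maximum line width, assumed <= LINE_CAP */
};

/* Left indent needed to center `used` characters in `width` columns. */
int center_indent(int used, int width)
{
    return (used < width) ? (width - used) / 2 : 0;
}

/* Write the buffered line with its centering indent, then reset the buffer. */
void flush_centered(FILE *out, struct line_buf *lb)
{
    int i;
    int indent = center_indent(lb->used, lb->width);

    for (i = 0; i < indent; i++)
        fputc(' ', out);
    fwrite(lb->text, 1, (size_t)lb->used, out);
    fputc('\n', out);
    lb->used = 0;
}

/* Add a word to the buffer, flushing the line first if the word won't fit. */
void add_word(FILE *out, struct line_buf *lb, const char *word)
{
    int len = (int)strlen(word);

    if (lb->used > 0 && lb->used + 1 + len > lb->width)
        flush_centered(out, lb);
    if (len > LINE_CAP)
        len = LINE_CAP;               /* this sketch truncates huge words */
    if (lb->used > 0)
        lb->text[lb->used++] = ' ';   /* separator between words */
    memcpy(lb->text + lb->used, word, (size_t)len);
    lb->used += len;
}
```
-
- The line length is known before anything is printed, so the indent can be
- worked out in time. The price is one extra buffer and a flush at the end of
- the centered block.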
-
- Bugs:
-
- Well, there shouldn't be too many of these, but some files don't seem to get
- correctly formatted at all. I have no idea why. If anyone has a solution then
- please contact me. Most things work perfectly though.
-
- This program doesn't need much in the way of memory, but it might do something
- slightly unsavoury like crashing if it runs out. It might just exit cleanly
- mind, I don't know. I've never had that little memory except in my head!
-
- I have placed both the compiled (runnable) binary and the ASCII source code
- in this archive. Feel free to change/recompile if necessary but do not
- redistribute any changed versions. If you make a stunning new feature for
- this, mail it to me and I'll make a new version if it really is stunning. I'll
- credit any new code or ideas of course.
-
- Well, I don't know if you're bored reading all this cak, but I sure am bored
- of writing it. If you have any more problems with this, mail me or check out
- the code (learn C if necessary).
-
- CMU
-